Add support for Synthetics Private Locations for DDR in dd-sync-cli#526
Merged
melkouri merged 8 commits on Jun 8, 2026
Resolves conflict in synthetics_private_locations.py: keep DDR include_pl_info=true query param while adopting main's new state.set_source() API.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…un dedup
Previously synthetics_tests had skip_resource_mapping=True, so the sync had no way to detect a destination test that already existed when state didn't know about it. Under degraded destination APIs (504/5xx storms during McNulty pool exhaustion on staging), repeated sync runs were accumulating duplicate tests in R2: a prior sync's POST succeeded server-side, the client exhausted retries and never persisted the destination id, and the next sync's create branch POSTed again.
This migration:
- Replaces skip_resource_mapping with resource_mapping_key="metadata.disaster_recovery.source_public_id", the stamp pre_resource_action_hook already writes on every source test before create/update.
- Adds a map_existing_resources() override that LISTs destination tests with include_metadata=true and indexes them by source_public_id, without the destination-versions side-effect that get_resources has.
- Adds a short-circuit at the top of create_resource: if R2 already has a test stamped with this source_public_id, adopt it into state and call update_resource instead of POSTing a new copy.
This does not address in-POST retry duplications (those happen mid-POST, after the existing_resources_map is built and frozen).
Operational mitigation for staging-degraded conditions: --max-workers 20 --http-client-retry-timeout 180
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
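The cross-run dedup described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `get_path`, `index_by_source_id`, and `create_or_adopt` are hypothetical names, the `post`/`put` callables stand in for the real HTTP client, and the plain-dict state is an assumption about shape only. The mapping key itself is the one named in the commit message.

```python
from typing import Any, Callable, Dict, List, Optional

# Dotted path named in the commit message as the resource_mapping_key.
MAPPING_KEY = "metadata.disaster_recovery.source_public_id"


def get_path(resource: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any hop is missing."""
    node: Any = resource
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def index_by_source_id(dest_tests: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index destination tests by their stamped source id, skipping unstamped ones."""
    index: Dict[str, Dict[str, Any]] = {}
    for test in dest_tests:
        source_id = get_path(test, MAPPING_KEY)
        if source_id is not None:
            # First match wins (an assumption); later duplicates are ignored.
            index.setdefault(source_id, test)
    return index


def create_or_adopt(
    source_id: str,
    source_test: Dict[str, Any],
    existing: Dict[str, Dict[str, Any]],
    state: Dict[str, Dict[str, Any]],
    post: Callable[[Dict[str, Any]], Dict[str, Any]],
    put: Callable[[str, Dict[str, Any]], Dict[str, Any]],
) -> str:
    """Adopt and update an already-stamped destination test, else POST a new one."""
    found = existing.get(source_id)
    if found is not None:
        # Adopt into state before updating, so the id survives a failed update.
        state[source_id] = found
        state[source_id] = put(found["public_id"], source_test)
        return "adopted"
    state[source_id] = post(source_test)
    return "created"
```

Because the index is built once before any create call, this sketch shares the limitation the commit message states: duplicates produced by retries inside a single POST are not caught.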
TODO: Add a guideline on how to sync Synthetics Private Locations for customers.
Per reviewer feedback (Michael Richey, Ron Hay): the "delete only sync-cli-managed resources" change to reset is a broader semantic discussion that deserves its own PR and possibly its own command name. Moving the behavior change out of this PR so the DDR PL replication can be reviewed in isolation. The discussion will continue in a follow-up PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- PL cassettes (test_synthetics_private_locations and test_cli):
* Append ?include_pl_info=true to source PL GET URLs.
* Add pl_id + public_key_test to GET responses (consumed by
create_resource for ddr_metadata + test_encryption_public_key).
* Update POST bodies to include ddr_metadata.disaster_recovery and
test_encryption_public_key as JSON-stringified payload.
- synthetics_tests cassettes: prepend recorded GET LIST interactions
against the destination for the new map_existing_resources call,
with empty {"tests": []} response (no orphans to dedupe in the
test scenario).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
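For orientation, the POST body shape these cassettes record can be sketched as below. This is an illustrative reconstruction from the PR description, not the PR's code: `build_ddr_pl_body` is a hypothetical helper, and keeping `pl_id` and `public_key_test` out of the body, as well as reading the source name from a `name` field, are assumptions.

```python
import json
from typing import Any, Dict


def build_ddr_pl_body(source_pl: Dict[str, Any]) -> Dict[str, Any]:
    """Build a destination POST body from a source PL fetched with include_pl_info=true."""
    # Assumption: pl_id and public_key_test are create-time inputs only,
    # so they are not forwarded as top-level fields.
    body = {k: v for k, v in source_pl.items() if k not in ("pl_id", "public_key_test")}

    # Drop a null metadata field instead of sending it.
    if body.get("metadata") is None:
        body.pop("metadata", None)

    # Only these two fields are accepted under disaster_recovery.
    body["ddr_metadata"] = {
        "disaster_recovery": {
            "source_pl_id": source_pl["pl_id"],
            "source_name": source_pl["name"],
        }
    }

    # The key object travels as a JSON string, not as a nested object.
    body["test_encryption_public_key"] = json.dumps(source_pl["public_key_test"])
    return body
```

The helper copies the source dict rather than mutating it, so the imported source state stays unchanged between runs.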
michael-richey approved these changes on Jun 5, 2026
What does this PR do?
- Adds DDR support for the `synthetics_private_locations` resource, enabling PL replication from a source org (R1) to a destination org (R2).
Related dogweb PRs: … `include_pl_info`
RFC: https://docs.google.com/document/d/1pxwa2vqa5I_NkhWlhXlcuFsDgSbBVQNdeXFEQ1vzQSM/edit?tab=t.0#heading=h.tmwe8zpx1a8q
- Migrates `synthetics_tests` off `skip_resource_mapping=True` to use the `map_existing_resources` infrastructure. Adds a cross-run dedup short-circuit so a sync run that finds a destination test already stamped with a known `source_public_id` adopts it into state instead of POSTing a duplicate. This guards against orphan accumulation when prior sync POSTs succeeded server-side but the client never received the response (504-after-success under destination API load).
Description of the Change
New CLI option: `--datadog-host-override`
- … `constants.py` (`DD_DATADOG_HOST_OVERRIDE` env var), `options.py` (CLI flag), and `configuration.py` (threaded to the `Configuration` dataclass).
Updated `synthetics_private_locations.py`:
- `excluded_attributes`: added `ddr_metadata`, `pl_id`, and `public_key_test`. `ddr_metadata` is returned on the destination for DDR PLs and must not be diffed against the source. `pl_id` and `public_key_test` are returned by the source-side GET (with `include_pl_info=true`) and are only used at create time, so they must not show up as diffs.
- `import_resource()`: calls `GET /api/v1/synthetics/private-locations/{id}?include_pl_info=true` so the source state captures `pl_id` and `public_key_test` at import time. No extra fetch at sync time.
- `create_resource()`: when a source PL is being created at the destination:
  - Reads `pl_id` + `public_key_test` from the source state (already captured at import).
  - Strips `null` metadata from the request body (DDR endpoint rejects it).
  - Builds `ddr_metadata.disaster_recovery` with `source_pl_id` and `source_name`. The dogweb schema is strict (`additionalProperties: false`) and accepts only these two fields.
  - Sets `test_encryption_public_key` to the JSON-stringified `public_key_test` object (`{pem, fingerprint, id}`).
  - … `datadog_host_override`.
  - … `resp["private_location"]`. When `--datadog-host-override` is set, it appears at `private_location.config.datadogHostOverride` and ships in the regular destination state JSON.
Updated `synthetics_tests.py`:
- `resource_config`: replaced `skip_resource_mapping=True` with `resource_mapping_key="metadata.disaster_recovery.source_public_id"`. That stamp is already injected on every source test by `pre_resource_action_hook`, so it doubles as the dedup key.
- `map_existing_resources()` override: fetches destination tests via `GET /api/v1/synthetics/tests?include_metadata=true` and indexes them by `source_public_id`. The custom override avoids the destination-versions side-effect that the existing `get_resources()` has.
- `create_resource()` dedup short-circuit: before issuing a new POST, checks whether a destination test already carries this `source_public_id`. If so, adopts it into destination state and calls `update_resource()` instead of POSTing. This prevents new duplicates when re-syncing after a prior orphan-creating run.
Test fixture update: moved `synthetics_tests` from `OPT_OUT_RESOURCES` to `MAPPING_RESOURCES` in `tests/unit/test_map_existing_resources.py`.
Operational guidance
When running against staging (`app.datad0g.com`), throttle concurrency to avoid McNulty pool exhaustion (512 status codes; see Confluence), for example with `--max-workers 20 --http-client-retry-timeout 180`.
The default `--max-workers 100` saturates dogweb's gunicorn pool on staging, producing 512s and 504s. The 504s are particularly bad on POST because dogweb may have written the resource server-side before timing out, and the client's retry then duplicates it. Throttling keeps the pool below saturation and mitigates this in-POST duplication path. Prod has more headroom and can usually run at the default.
Known limitations
In-POST retry duplication: if a single POST attempt 504s and the client retries, dogweb may have already created the resource, so each retry can create another server-side row. The throttling guidance above is the operational mitigation; a proper fix (skipping 504 retries on POST in `custom_client.py`) is out of scope for this PR.
Verification Process
Tested end-to-end on staging (`app.datad0g.com`) between two orgs with `--max-workers 20 --http-client-retry-timeout 180`:
- `import`: confirmed source state captured `pl_id` and `public_key_test` via `include_pl_info=true`, and that every R1 test landed in `resources/source/synthetics_tests.json`.
- `sync`: confirmed PLs were created via the DDR endpoint with the right schema, tests were created in R2 with `metadata.disaster_recovery` stamps, and destination state matched R2 counts after throttled runs.
- … `synthetics_private_locations_check_assignment` populated and the test began running on the replicated PL.
- Re-ran `sync` after dropping a destination state entry, and confirmed the dedup short-circuit adopts the existing R2 test instead of creating a duplicate.
Follow-up
A separate PR will revisit the `reset` command semantics ("delete only sync-cli-managed resources" vs. the current "delete everything in R2"). That conversation is out of scope here.
Release Notes
- Added DDR support for `synthetics_private_locations`.
- Added `--datadog-host-override` CLI option for optional CNAME override during PL replication.
- Migrated `synthetics_tests` to `map_existing_resources` with cross-run dedup against `metadata.disaster_recovery.source_public_id`. Re-running sync after a degraded run no longer accumulates duplicate tests in the destination org.
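The fix that Known limitations leaves out of scope (skipping 504 retries on POST) might look roughly like the predicate below. This is a sketch under assumptions, not part of this PR and not the actual `custom_client.py` logic: `should_retry` and the `RETRYABLE_STATUSES` set are hypothetical.

```python
# Hypothetical set of statuses a client might normally retry.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def should_retry(method: str, status: int) -> bool:
    """Decide whether a failed request is safe to send again."""
    # A 504 on POST may mean the server already wrote the resource before
    # timing out, so sending the POST again risks creating a duplicate.
    if method.upper() == "POST" and status == 504:
        return False
    return status in RETRYABLE_STATUSES
```

With a rule like this, a timed-out create would surface as a failure for the next sync run to reconcile through the dedup short-circuit, rather than being repeated within the same run.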